(54) Monitoring of data flow for enhancing network security



(57) The invention relates to the monitoring of the flow of a data stream traveling between a client and a
server system. The invention is intended particularly for such communications protocols carrying representation
data above some connection-oriented protocol layer. The objective of the present invention is to bring about
a flow monitoring mechanism enhancing system security. This is achieved by analyzing a data stream
traveling from the server to the client in order to identify at least one response descriptor in the data stream.
The identified response descriptors are stored in a set of available states for said client. Then the data stream
traveling from the client to the server is analyzed in order to identify at least one request descriptor. The request
descriptors identified are compared with the set of available states for said client, and in response to the
comparing step, a monitoring result is generated.
Description

Field of the Invention

[0001] The invention relates to monitoring the flow of a data stream traveling between a client and a server system.
The invention is intended particularly for communications protocols carrying representation data that are carried above
some connection-oriented protocol layer.

Background of the Invention


[0002] Due to the continuing increase in Internet usage, the number of potential targets for attacks against
computer systems by malicious users has exploded. In order to find possible security loopholes in the numerous
systems attached to worldwide communications networks, the parties seeking to break into networked computers have
developed a good arsenal of powerful tools, which can systematically find resources available for suspicious activities.
Together with the skills of the intruders this has proven to be extremely harmful for network security, even though it
has given birth to a rapid growth of the network security industry battling against the unauthorized use and
eavesdropping of networked computers.

[0003] A typical example of a network in which intrusions and other unsolicited attacks may take place is shown in
FIG. 1A. The core network is in this case the Internet 102, to which the hacker denoted by client 101 is connected. On
one side of the Internet is a gateway element 103 separating a private network 104, such as a corporate LAN or intranet,
from the Internet. The hosts connected to the intranet, such as server 105, are the potential targets of intrusions or
attacks.

[0004] In particular, many service platforms running on servers attached to the Internet have proven vulnerable
to malicious users. As an example, many e-commerce systems used in the World Wide Web (WWW) have been
attacked. This has caused both financial damage and severe problems regarding the misuse of confidential user data,
such as credit card numbers or other sensitive information.

[0005] If the service platform does not meet high security standards, it is probable that it can be misused to get
control over the system, to gain some classified information, or at least to cause the system to crash. As an example,
it can be expected that fixed parameters in a query string coming from a client have the same values as in the original
Hypertext Transfer Protocol (HTTP) stream sent to the client by the server. If an attacker changes the parameter
values in an unexpected way, the system may start behaving in an odd way. The same applies to the lengths of the
parameters that the client returns.

[0006] Web servers may have so-called directory traversal bugs, which notoriously can be exploited to retrieve a file
not intended for the public. An example of this is an HTTP request typed in the location window of a web browser, such
as http://www.sitename.net/../../../etc/password. The server may return the password file of a UNIX system as a
response, the file being readily available for future decryption purposes.
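
A traversal request of this kind can be recognized by resolving the requested path before it is mapped onto the file system. The following Python sketch is illustrative only and is not part of the described invention; the function name `escapes_document_root` and the document root `/var/www` are assumptions made for the example.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def escapes_document_root(url: str, document_root: str = "/var/www") -> bool:
    """Return True if the URL path resolves to a location outside document_root."""
    # Decode percent-escapes first so that "%2e%2e" is treated like "..".
    path = unquote(urlsplit(url).path)
    # Join the path onto the root and collapse "." and ".." segments.
    resolved = posixpath.normpath(posixpath.join(document_root, path.lstrip("/")))
    root = posixpath.normpath(document_root)
    return resolved != root and not resolved.startswith(root + "/")
```

With these assumptions, the example URL above resolves to `/etc/password`, which lies outside the document root, whereas an ordinary page request stays inside it.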

[0007] Other common targets of attacks have been different Common Gateway Interface (CGI) binaries and scripts
processing Hypertext Markup Language (HTML) forms. CGI is the standard for the environment passed by a server
to scripts used in the WWW. Many CGI scripts have bugs, which may be directory traversal bugs or something else,
such as shell-related and buffer overflow bugs.

[0008] Measures developed for tackling and/or noticing the security vulnerabilities or exploits in servers are, for
example, security scanners and network intrusion detection systems (NIDS). Using some of the security scanners available,
it is possible to perform a vulnerability analysis of the server, which can detect the vulnerable parts of a service. In the
analysis the CGI scripts within the system are scanned for known vulnerabilities. A NIDS is a silent listener, which
monitors traffic flowing in the network and generates an alarm when something suspicious is detected in the traffic.
The NIDS looks for regular expressions, and the matching is usually done for Ethernet frames, IP packets, or for the
TCP stream. Fingerprints of known harmful HTTP requests may be used in looking for suspicious parameter values.
Typically, HTTP requests and responses are checked as corresponding pairs, because one TCP connection typically
carries one HTTP request-response pair. The problem with using fingerprints is that they are vulnerable to even slight
modifications of the request pattern used. In addition, the method is computationally heavy, as multiple fingerprints are
needed when analyzing every single request. This has an adverse effect on the overall performance of the system.
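
The weakness of fingerprint matching can be illustrated with a short Python sketch. The two patterns below are hypothetical fingerprints chosen for the example and are not taken from any real NIDS rule set; the function name `matches_fingerprint` is likewise an assumption.

```python
import re

# Hypothetical fingerprints of known harmful requests (illustrative only).
FINGERPRINTS = [
    re.compile(r"\.\./"),        # literal directory traversal sequence
    re.compile(r"/etc/passwd"),  # literal name of a sensitive file
]


def matches_fingerprint(request_line: str) -> bool:
    """Return True if any fingerprint occurs in the raw request line."""
    # Every request is tested against every pattern, which is the source
    # of the computational cost mentioned in the text.
    return any(pattern.search(request_line) for pattern in FINGERPRINTS)
```

In this sketch the request line `GET /../../etc/passwd HTTP/1.0` is detected, but the percent-encoded variant `GET /%2e%2e/%2e%2e/etc/%70asswd HTTP/1.0` passes unnoticed, although a server that decodes the escapes would treat both requests alike.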
[0009] An HTTP proxy which can filter the request-response pairs is another security solution. Unfortunately, it has
the same limitations as the other solutions discussed above. FIG. 1B shows a simplified example of the generic action
of a prior art HTTP proxy-gateway 151. The user agent 150 wishing to send a request to the web server 152 sends a
request to the proxy. The proxy may cache some information, such as static web pages, in order to reduce the load
caused to the server by incoming requests. The user agent, such as a web browser, first sends a TCP SYN packet
181 to the proxy in order to establish a connection. If the proxy allows the connection, it answers by sending a TCP
SYN+ACK packet 182 in response. The user agent receives this and answers with a TCP ACK packet 183, thus
finalizing the TCP three-stage handshake procedure. The host ID may be checked from the IP frame on which the
TCP packets are carried. Similarly, the client ID may be verified from the TCP headers; such information may include
target and destination port and so on. The user agent is then ready to send an HTTP request 184, for example, a request
for a web page containing information in a presentation language such as HTML.

[0010] The proxy similarly opens a TCP connection to the web server by sending messages 185-187, provided that
the response for the request is not cached in the proxy, as is the case, for example, when the HTTP data, such as HTML
content, has some dynamic parts, such as forms or scripts. The proxy then sends the HTTP request 188 to the web
server. The request 188 may essentially be similar to the original request 184. The server responds to the request by
sending an HTTP response 189 to the proxy. The proxy then forwards the response to the user agent as a new HTTP
response 190. The user agent may close the TCP connection by sending the TCP FIN+ACK packet 191, to which the
proxy answers by sending first a TCP ACK packet 192 and then a TCP FIN+ACK packet 193. The proxy then closes
the connection to the web server by sending a TCP FIN+ACK packet 194. The web server answers by sending first a
TCP ACK packet 195 and then a TCP FIN+ACK packet 196.

[0011] Fingerprinting is not well suited to detecting attacks against the dynamic pages used widely in modern
information systems. It is also possible that the connection between the user agent and the HTTP proxy persists. This
means that the user agent does not close the connection between the HTTP proxy and the user agent after receiving
the HTTP response 190, but may send further requests to the proxy.

[0012] None of the prior art solutions described above is able to detect unforeseen attacks against CGI binaries
utilizing forms or scripts, for example. The prior art solutions also perform poorly in guarding the server against
user-induced buffer overflows of fields of static length, such as the hidden select/option fields used in many HTML forms.
In other words, the prior art solutions fail to protect a server against unsolicited attacks and misuse attempts, made
either purposely or by mistake, in cases when the misuse is performed on a legal system port using formally correct
queries but utilizing the weaknesses of the system.

Summary of the Invention

[0013] The objective of the present invention is to bring about a flow monitoring mechanism enhancing system
security. This is achieved by using a method, system, or computer program product described in the independent claims.
[0014] The present invention relates to the monitoring of a data stream traveling from a client to a server. The data
stream includes representation data carried on a connection-oriented carrier protocol. The representation data
preferably utilizes some other protocol than the carrier protocol, and may be a stateless protocol. In the monitoring process,
a data stream traveling from the server to the client is analyzed in order to identify at least one response descriptor in
the data stream. Identified response descriptors are stored in a set of available states for the client. Next, a data stream
traveling from the client to the server is analyzed in order to identify at least one request descriptor. Identified request
descriptors are compared with the set of available states for said client, and responsive to the comparing step, a
monitoring result is generated.
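
The monitoring process of this paragraph can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the class name `FlowMonitor`, the representation of descriptors as plain strings, and the shape of the monitoring result are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set


class FlowMonitor:
    """Tracks, per client, the requests that earlier responses made available."""

    def __init__(self) -> None:
        # client identifier -> set of response descriptors (available states)
        self.available_states: Dict[str, Set[str]] = defaultdict(set)

    def observe_response(self, client_id: str, response_descriptors: Set[str]) -> None:
        """Store descriptors identified in a server-to-client data stream."""
        self.available_states[client_id] |= response_descriptors

    def check_request(self, client_id: str, request_descriptors: Set[str]) -> dict:
        """Compare request descriptors with the stored set; return a monitoring result."""
        unmatched = request_descriptors - self.available_states[client_id]
        return {"ok": not unmatched, "unmatched": sorted(unmatched)}
```

In this sketch a request is reported as matching only if every one of its descriptors was offered to the same client in an earlier response; what is done with a non-matching result is left to the caller.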

[0015] In accordance with one aspect of the present invention, the monitoring may further include performing a
predetermined action at least partially based on the monitoring result.

[0016] In accordance with one aspect of the present invention, said predetermined action may include matching the
data stream against known misuse patterns, if at least one request descriptor fails to match the stored response
descriptors in the set of available states.

[0017] In accordance with one aspect of the present invention, said predetermined action may include allowing the
data stream passage if the request descriptors match the stored response descriptors in the set of available states,
and restricting the data stream if at least one request descriptor fails to match the stored response descriptors in the
set of available states.

[0018] In accordance with one aspect of the present invention, said predetermined action may include generating
an alarm event, which is selected at least partially based on the monitoring result.

[0019] In accordance with one aspect of the present invention, in the monitoring process a host identifier from the
carrier protocol part of the first data stream between the client and the server may be stored, and then used in order
to select the set of available states for said client. The host identifier is used for identifying available states for any
client of said host.

[0020] In accordance with another aspect of the present invention, this may further be accomplished by storing a 
client identifier from the first data stream between the client and the server. The stored client identifier is then used to 
identify the set of available states for said client. 
[0021] In accordance with another aspect of the present invention, the client identifier can be compared to the client
identifiers stored in order to select the set of available states for said client. If the result of the comparison is that the
client identifier does not correspond to the client identifiers stored, the set of available states for said client is selected
from the set of available states for said host.
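
The selection described in paragraphs [0019] to [0021] amounts to a lookup by client identifier with a fallback to the host. A minimal Python sketch follows; the function name and the two dictionaries are assumptions made for illustration.

```python
from typing import Dict, Optional, Set


def select_available_states(
    host_id: str,
    client_id: Optional[str],
    client_states: Dict[str, Set[str]],
    host_states: Dict[str, Set[str]],
) -> Set[str]:
    """Pick the set of available states for a request."""
    # Prefer the client-specific set when the client identifier is known.
    if client_id is not None and client_id in client_states:
        return client_states[client_id]
    # Otherwise fall back to the states recorded for the whole host.
    return host_states.get(host_id, set())
```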
[0022] In accordance with one aspect of the present invention, in the monitoring process the analyzing of a data
stream traveling from the server to the client includes analyzing different possible states of macro and/or form definitions
included in the data stream in order to identify at least one response descriptor.

[0023] In accordance with another aspect of the present invention, the analyzing of different possible states may
further include executing the macro file in the system in order to identify at least one response descriptor.

[0024] The present invention is especially well suited to protecting or tightening systems utilizing protocols, such as the
HTTP or the Wireless Application Protocol (WAP), carrying representation data, such as the HyperText Markup
Language (HTML), Wireless Markup Language (WML), Extensible Markup Language (XML), or XML-based representation
languages, such as WSDL/SOAP (Web Services Description Language/Simple Object Access Protocol). The protocol
carrying representation data may be a stateless protocol if the client can be identified in a straightforward manner,
such as is the case when the TCP is used as a carrier protocol. When applied to these protocols, the available states,
which are first detected from the data stream traveling from the server to the client, may correspond to the requests
available. If the data stream is such that more complex possibilities are also available, some of the available states
can be computed in advance and stored in a state table corresponding to the set of available states. Thus the analysis
is performed on a response-request pair as opposed to the request-response analysis carried out in prior art. In addition,
the invention teaches a method of using information obtained from a response transmitted within one TCP connection
for the purpose of analyzing a request transmitted within another TCP connection, instead of the prior art analysis of
a request-response pair of a single connection.

[0025] In order to facilitate the monitoring of the data flow, in one embodiment of the present invention the first query
from the client to the server triggers the updating of a connection table. The connection identifier may further be detected
to identify the session between the client and the server and thus enhance the efficiency of the system by faster
correlation of the request-response pairs.

[0026] The present invention may be implemented in a NIDS or a proxy-firewall, for example. Thus, the invention
provides new functionality to these prior art systems.






Brief description of the drawings 

[0027] The invention is described more closely with reference to the examples shown in FIGS. 2-6 of the
accompanying drawings, in which



FIG. 1A shows simplified architecture of a typical network environment where HTTP proxies are commonly used,

FIG. 1B illustrates the operation of a prior art HTTP proxy,

FIG. 2 is an exemplary functional block diagram showing how the present invention applied to a data stream
moderator works,

FIG. 3 shows how the present invention as realized in FIG. 2 works for HTTP traffic,

FIG. 4 illustrates how the data stream moderator in FIG. 2 operates when detecting a data stream of a response
traveling from the server to the client,

FIG. 5 illustrates the function of the ANALYZE RESPONSE block as shown in FIG. 3 in more detail when applied
to an HTTP stream,

FIG. 6 shows how the data stream moderator in FIG. 2 can operate when the present invention is applied to an
HTTP stream, and

FIG. 7 depicts how the set of available states in FIG. 2 may be initialized for a new client.



Detailed description of the Invention 



[0028] In the following description, the invention is described in connection with an HTTP proxy in order to better
illustrate the operation of the invention. This is not meant to be restrictive, however, as the invention may be employed
in any other suitable network device as well. The present invention is described in more detail with reference to FIG.
2. A data stream coming from a client, i.e. from the user agent 150, is first received in a receiving block 202. The data
stream may correspond to any traffic traveling in the Internet, but in what follows it is assumed that at least a part of it
corresponds to some protocol used for carrying representation data, and which may be in a stateless protocol format,
such as HTTP. The receiving block in this example may be located in a proxy. The data stream may consist of packets,
which may be collected to form a complete request in the receiving block. When the receiving block receives TCP/IP
packets, it can extract destination and host information, such as the network addresses and ports, from the header
parts of the packets. Generally the HTTP is carried above the TCP layer, and requests or responses no longer contain
this information.

[0029] The receiving block 202 forwards the data stream received, together with the host information, to the control
block 204. The control block performs buffering of the received data, but more interestingly, if the data is intended for
some HTTP port in the server 152, it forwards the data to an inspection block 206 together with the host information.
The control block also receives validated data from the inspection block to be fed to a forwarding block 208. The
forwarding block forwards the data stream to the server in a manner described above.

[0030] The inspection block has a host ID recognition block 220, which together with a client recognition block 222
can check whether the client has communicated with the server before. The host recognition block checks if it can
identify the originating host on the basis of the host information. The client recognition block checks if it can identify
any client identifier in the data stream. Such client identifiers include identification cookies or other information, which
have been preconfigured by a system administrator. If such preconfigured information does not exist, other identification
information, such as the HTTP User-Agent header value, can be used. The client identifier may be used to identify the
client. Further, the client identification block may include a part for recognizing other common parameters. Common
parameters are parameters which are common to all connections/requests originating from one client. Such common
parameters may be, for example, the HTTP Host header value, i.e. the name of the target server, and then some further
cookie references.

[0031] The host and client information are stored in a table 250 together with the common parameters detected. The
request parser 232 parses the request, and may operate as described in more detail below. The limitation check block
234 verifies that the parsed parameters in the request are within the defined limitations, which are read from the
limitations table 252. The corresponding recognition blocks 220 and 222 check if the information exists in the table 250.
If the corresponding entry is not found, they forward the data stream to a fingerprint matcher block 230, which checks
whether the HTTP data stream contains parts resembling known attack patterns, i.e. known fingerprints 253. If the
fingerprint result is negative, i.e. no attack is detected, a new entry is made in the table 250. In this way it can be at
least partly verified that a new client is not attempting an attack. In the opposite case, the data stream may be stored
in memory in a table 254 containing invalid parameter values, and then forwarded to an event analysis and reporting
block 240. As may be noted, the parsing and limitation checking steps may be performed either before or after the
operations performed in steps 220 and 222, because it is desirable to parse all requests in order to inspect them in
more detail. The limitations table 252 may, for example, contain allowed parameter value ranges for different CGI
binaries. These limitations may depend on each CGI binary file, so the limitations described in the limitations table 252
may be separately defined for each CGI binary and each script.
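
A limitation check of the kind performed by block 234 might look like the following Python sketch. The table contents, the script path `/cgi-bin/order.cgi`, and the parameter names are hypothetical and serve only to illustrate per-script limits on parameter length and value.

```python
from typing import Dict, List, NamedTuple


class Limit(NamedTuple):
    max_length: int
    allowed_values: frozenset = frozenset()  # empty means "any value"


# Hypothetical limitations table: script path -> parameter name -> limit.
LIMITATIONS: Dict[str, Dict[str, Limit]] = {
    "/cgi-bin/order.cgi": {
        "quantity": Limit(max_length=3),
        "currency": Limit(max_length=3, allowed_values=frozenset({"EUR", "USD"})),
    },
}


def check_limitations(script: str, params: Dict[str, str]) -> List[str]:
    """Return a list of violations found in the parsed request parameters."""
    violations = []
    limits = LIMITATIONS.get(script, {})
    for name, value in params.items():
        limit = limits.get(name)
        if limit is None:
            continue  # no limitation defined for this parameter
        if len(value) > limit.max_length:
            violations.append(f"{name}: longer than {limit.max_length}")
        elif limit.allowed_values and value not in limit.allowed_values:
            violations.append(f"{name}: value not allowed")
    return violations
```

Checking lengths separately from values reflects the text's concern with overflows of fixed-length fields as well as with altered parameter values.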

[0032] Then descriptors defining the content of the request are identified in the descriptor identification block 226
from the parsed request. The identification of descriptors is discussed below in more detail with reference to FIG. 5.
The descriptor verification block 224 reads the available states for said client from a table of available states 251, which
contains descriptors, i.e. what kind of legitimate/valid requests may be coming from the client. The descriptor verification
block further compares the descriptors in the request to the descriptors in the table of available states. Basically the
table of available states is created for each client separately, but it could as well be a host-based table, in which the
available states for different clients from the same host are collected. Either the request parser, the descriptor
identification block, or the descriptor verification block can have the functionality of mapping at least some part of the request
into a more generic format, i.e. expressions with syntax such as %20 can be replaced with a space, and so forth.
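
The mapping into a more generic format can be sketched with the Python standard library. The function name `normalize_request_target` is an assumption; the sketch decodes percent-escapes once and does not attempt to handle doubly encoded input.

```python
from urllib.parse import unquote


def normalize_request_target(target: str) -> str:
    """Map a request target to a more generic form before comparison."""
    # unquote() replaces percent-escapes such as %20 with the characters
    # they stand for, so that equivalent requests compare equal.
    return unquote(target)
```

Applying the same normalization to descriptors taken from responses and from requests keeps the comparison in the descriptor verification block consistent.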
[0033] The verification result analysis block 228 notes whether any invalid parameter values or descriptors
are detected. The verification result analysis block may have a rule base, on the basis of which it
can classify failures to meet the available states and/or common limitations. At least the cases for which
the failure is classified as serious or hazardous are forwarded to the event analysis and reporting block 240, instead
of the data stream being passed to the forwarding block 208 to be forwarded to the server 152.
[0034] The event analysis and reporting block may read the invalid parameter values file 254, which may also contain
a classification made by the verification result analysis block 228, according to a rule base if it has been defined. The
event analysis and reporting block may get the data stream as a whole as well, together with information from the host and
client identifiers. The events are further classified and stored in the classified events file 256, and the events serious
enough to be reported are recorded in the reported events file 255, which may include means for visually or audibly
outputting an event occurrence code for administrative purposes. The event analysis and reporting block may further
record all communications permanently as evidence.

[0035] The data stream traveling from the server 152 to the user agent 150 is analyzed as well. For this purpose, it
is possible to use an inspecting block 266. It is evident that the functionalities of the inspecting block 266 may be
incorporated in the inspection block 206, which in this way could take care of inspecting traffic traveling in both
directions. The solution described utilizing different inspection blocks for the different directions is used only for load sharing
purposes. In this way it is easy to distribute the tasks of the different inspection blocks to different processors or
computers.

[0036] The receiving block 262 receives the data stream coming from the server and forwards it to the control block
264, which operates in much the same way as the control block 204 does. The main difference is that now different
parts of the TCP and IP headers are used, as it is the destination port and address which define the host running the
user agent application to which the traffic is traveling, instead of the source information used for the traffic going in the
opposite direction.

[0037] The control block forwards the content HTTP stream to the inspection block 266. First the host ID is recognized
from the data stream in the corresponding host ID recognition block 272. Then the client ID is identified in the client
ID recognition block 274. If the response is representation data, the response is parsed in the response parser 276.
The available states and limitations offered by the response are identified in the descriptor identification block 278. If
the response includes scripts or forms, the descriptor identification block may further inspect them in order to find
possible limitations and illegal parameter combinations. If something is found, the data can be stored in the limitations
file 252. If the response includes scripts, for example, it may be necessary to execute them before the available and
illegal parameter values and/or limitations can be noticed.

[0038] The available states identification block 280 reads the table of available states for the client 251. The correct
table is selected on the basis of the host and client IDs, which can link the identities to the available states table in the
table for host and client IDs and common parameters 250. The available states identification block then compares
descriptors received from the descriptor identification block with the stored available states 251 and removes duplicates
from the available states detected. Then it stores the new available states into the table of available states. The
forwarding block 268 receives from the control block 264 the data stream containing also the response data and sends
it to the client.
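
The duplicate removal and storing performed by the available states identification block can be sketched as below. The function name `store_new_states` and the use of a Python set for the stored states are assumptions made for illustration.

```python
from typing import Iterable, List, Set


def store_new_states(stored: Set[str], detected: Iterable[str]) -> List[str]:
    """Add detected descriptors to `stored`; return only those that were new."""
    new_states = []
    for descriptor in detected:
        # Skip descriptors that are already stored, including repeats
        # within the same response.
        if descriptor not in stored:
            stored.add(descriptor)
            new_states.append(descriptor)
    return new_states
```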

[0039] The functional block diagram of FIG. 2 is suitable for a NIDS as well. The main difference in that case is that the
forwarding and control blocks are not needed, as a NIDS is a silent listener and only monitors network traffic.
[0040] FIG. 3 illustrates data flow between the user agent 150 and the web server 152, discussed below in more
detail. The user agent sends a TCP SYN packet 310 to the web server in order to establish a connection. The server
responds by sending a TCP SYN+ACK packet 312 in return. The user agent receives the packet 312 and answers by
sending a TCP ACK packet 314, after which the TCP connection is fully established.

[0041] The open connection may be used by the client, for example, for sending an HTTP request 316 to the web
server. The web server analyzes the request and responds with an HTTP response 318. If the HTTP connection is
non-persistent, the client closes the underlying TCP connection by sending a TCP FIN+ACK packet 320, to which the
web server responds by sending first a TCP ACK packet 322 and then a TCP FIN+ACK packet 324.
[0042] To send a new HTTP request the client has to open a new TCP connection. This is done similarly by sending
a TCP SYN packet 326, receiving a TCP SYN+ACK packet 328, and further by sending a TCP ACK packet 330 before
the new HTTP request 332 may be sent. After receiving the response 334, the client again closes the TCP connection
by sending a TCP FIN+ACK packet 336, which is confirmed by the server responding with a TCP ACK 338 and TCP
FIN+ACK 340.

[0043] If an HTTP proxy 151 is used between the client and the server, the messaging corresponds to the process
already discussed with reference to FIG. 1B. To implement the present invention for monitoring the traffic between a
server and a client, however, it is not necessary to use a web proxy. The traffic is analyzed by the analysis system,
which may be located in a proxy or in some other network element. The element may be a part of a network intrusion
detection system as well.

[0044] If a proxy system is used, the analysis of the response-request pairs according to the present invention would
be carried out for the request 184 and response 190, and so on for further communications as well, or alternatively for
the traffic between the proxy and the web server. In the latter case the request-response pairs have to be identified in
a different way, because the originator of the packet received by the web server may indeed be the proxy and not the
user agent. Hence, the host information cannot be used for identifying a host, but the invention is implementable with
a proxy system as well, since the client identifiers are still usable. Moreover, if identification cookies or some other
identifying mechanisms are used, different clients can be identified without the host ID. In general, information contained
in the request 316 is passed to the analysis system 300. The first inspection block 206 checks the host and client
identifiers and possible common parameters, and stores them in the corresponding table 250. It may also perform a
limitation check and a fingerprinting procedure in order to find malicious requests, classify and report events, etc., as
described above with reference to FIG. 2. The first inspection block 206 analyzes the request in its descriptor
identification block 226. If the request is the first one, it can only be checked against fingerprints and the limitations file 252.
[0045] The response 318 is analyzed as well. The second inspection block 266 performs tasks similar to the first
inspection block 206 in the sense that it analyzes the HTML part of the stream and tries to identify possible descriptors
and limitations. The descriptors identified are stored in the table of available states 251. Similarly, the limitations
identified are stored in the limitations file 252.

[0046] When the analysis system receives the next HTTP request coming from the same host, the request is also
analyzed in the first inspection block 206. Because the first response destined for the client has already been received,
the inspection block now has more information, since the second inspection block 266 has already stored some state
information in the table of available states 251. This information may be used to verify the descriptors identified in the
request 332. In this sense one could understand the process as validation of the requests, but this might be a bit
misleading in that if a network intrusion detection system is in use, the traffic does not need to be validated but only
suspicious activities need to be identified.

[0047] The system continues the operation in a similar way. In other words, future requests are analyzed against
all historical knowledge about descriptors identified in the responses during the connection.

[0048] FIG. 4 shows the analysis task of the inspection block 266, which is used for inspecting responses sent from
the server to the user agent. When the inspection block receives the traffic (step 400), the data containing the HTTP
response and host identity is read in step 402. Basically, blocks 272, 274, 276, and 278 may be involved in different
parts of this process. The inspection block verifies (step 404) that the response is in a presentation language, such as
the HTML. The HTTP responses may also include data in binary format, such as images, sound, or other multimedia
information. These parts of the responses do not necessarily need to be analyzed. If the response consists wholly of
such data, the process is terminated (step 410). Otherwise, at least the host identity part of table 250 is read (step
406) in order to retrieve the available states information correctly, but this step may also include reading the client
identity table and other common parameters.

[0049] After this the response is decoded and parsed in step 407. The parsing is described in more detail with reference
to FIG. 6. The analysis step 408 analyzes the parsed response in more detail, after which the processing of the response
is terminated (step 410) in the inspection block 266.
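By way of illustration only, the decision of steps 404 to 410 may be sketched as follows. The sketch assumes that the
presentation language is recognized from the HTTP Content-Type header, which the description does not prescribe; the
set PRESENTATION_TYPES and the function names are hypothetical.

```python
# Hypothetical media types treated as presentation languages (assumption).
PRESENTATION_TYPES = {"text/html", "text/vnd.wap.wml", "text/xml", "application/xml"}


def is_presentation_response(headers):
    """Step 404: return True if the Content-Type names a presentation language."""
    content_type = headers.get("Content-Type", "")
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in PRESENTATION_TYPES


def inspect_response(headers, body, analyze):
    """Skip purely binary responses (step 410); otherwise analyze the body (step 408)."""
    if not is_presentation_response(headers):
        return None
    return analyze(body)
```

Here analyze stands for whatever analysis routine the inspection block applies to the parsed response.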

[0050] FIG. 5 illustrates how the response is analyzed in step 408. First the table of available states is read in step
502. From table 251, the entries corresponding to the available states are selected on the basis of the client identity.
Alternatively, this can be implemented so that there is a separate table reserved for each client or host identity.
[0051] After step 502, the content of the response is examined. In other words, possible descriptors corresponding
to the available states are identified from the response in order to later examine the requests. In step 504 possible
HTTP REDIRECT messages are identified, and then the parameters of these messages are added as descriptors to
the set of available states under examination. For example, a redirect message might inform that a URL (Universal
Resource Location) http://www.stonesoft.com/services/ is indeed located at http://www.stonesoft.com/customer/services/.

[0052] In step 506 the states given in the possible HTML HREF tags are identified. These correspond to hyperlinks
and are commonly identified in a stream like
<a href="/contact.html">Contact</a>

where the HREF part identified has been emphasized with a bold font. If HREF identifiers are found, they are added
as descriptors into the set of available states under examination. That is, if the following request from the client is a
GET request for /contact.html, the request is included in the available states and the request is therefore legal.
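Steps 504 and 506 may be sketched, purely as a non-limiting example, with the HTML parser of the Python standard
library. The sketch assumes that a redirect target is carried in the Location header of a 3xx response; the names
HrefCollector and response_descriptors are hypothetical.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the HREF values of anchor tags found in a response body."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def response_descriptors(status, headers, body):
    """Return the set of descriptors a response makes available to the client."""
    descriptors = set()
    # Step 504: the target of a redirect becomes an available state.
    if 300 <= status < 400 and "Location" in headers:
        descriptors.add(headers["Location"])
    # Step 506: every hyperlink in the body becomes an available state.
    collector = HrefCollector()
    collector.feed(body)
    descriptors.update(collector.hrefs)
    return descriptors
```

Applied to the hyperlink of the example above, the function returns a set containing /contact.html, so that a later GET
request for that resource can be matched against the stored descriptor.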

[0053] In step 508 analysis of possible forms in the response is performed. An example follows of a form part in the
response, the form part being notified by the FORM tags, denoted with a bold font:

<HTML>
<form action="/sites/search.exe" method="GET">
<input type=hidden name=profile value="user">
<input type=text name=query size=50 maxlength=800>
<SELECT NAME=language>
<OPTION VALUE=XX>any language
<OPTION VALUE=en>English
<OPTION VALUE=zh>Chinese
<OPTION VALUE=fi>Finnish
<OPTION VALUE=fr>French
<OPTION VALUE=de>German
<OPTION VALUE=sv>Swedish
</SELECT>
<input type=submit name=search value="Search">
</form>
</HTML>


[0054] Here the available options, parameter length limitations, and other relevant information are stored as
descriptors in the set of available states and limitations under examination. The available states for the resource
/sites/search.exe for the client in question obtained from the above response are:

method = GET,
profile = user,
language = XX, en, zh, fi, fr, de or sv, and
search = Search.

That is, the future requests from the client in question need to conform to these available states. The limitations for the
resource /sites/search.exe, which concern all hosts and clients, obtained from the above response are: 1) the length
of the query parameter value is less than or equal to 800, and 2) the length of the language parameter value equals 2.
That is, all legal requests from any client to the site in question need to conform to these limitations.
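One possible way of deriving the above available states and limitations from the form is sketched below. The class
name FormAnalyzer and its attributes are hypothetical, and the rule that a SELECT element whose option values all share
one length yields an exact-length limitation is an assumption made for this sketch only.

```python
from html.parser import HTMLParser


class FormAnalyzer(HTMLParser):
    """Derives available states and limitations from a form (step 508)."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.states = {}        # parameter name -> set of allowed values
        self.max_length = {}    # parameter name -> maximum value length
        self.exact_length = {}  # parameter name -> required value length
        self._select = None     # name of the SELECT element being read

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form":
            self.action = a.get("action")
            self.states["method"] = {(a.get("method") or "GET").upper()}
        elif tag == "input" and a.get("name"):
            name = a["name"]
            kind = (a.get("type") or "").lower()
            # Hidden and submit fields carry fixed values chosen by the server.
            if kind in ("hidden", "submit") and a.get("value") is not None:
                self.states.setdefault(name, set()).add(a["value"])
            if (a.get("maxlength") or "").isdigit():
                self.max_length[name] = int(a["maxlength"])
        elif tag == "select":
            self._select = a.get("name")
        elif tag == "option" and self._select and a.get("value") is not None:
            self.states.setdefault(self._select, set()).add(a["value"])

    def handle_endtag(self, tag):
        if tag == "select" and self._select:
            lengths = {len(v) for v in self.states.get(self._select, ())}
            if len(lengths) == 1:
                self.exact_length[self._select] = lengths.pop()
            self._select = None
```

Feeding the example form to an instance of the class yields the states method, profile, language and search listed
above, a maximum length of 800 for query, and an exact length of 2 for language.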

[0055] If the response consists of scripts, they are analyzed in step 510. This is a more complicated task, because
it is possible that the script action cannot be deduced merely by analyzing the contents of the script data. Hence, it
may be necessary to execute the script in order to find out what kind of responses it is able to generate so that the
corresponding response descriptors can be identified. The script is analyzed in a processor, which may be the one the
inspection block 266 utilizes, or the script analysis may be performed in a separate analysis processor. The results of
the analysis, which correspond to the descriptors identified, are stored in the set of available states under examination
as well.

[0056] After the available descriptors in the response have been analyzed, they are compared with the available
states read from table 251 in step 502. If there are any descriptors that are not already contained in table 251 for said
user agent, they are updated in the table (step 514). In order to do this efficiently, possible duplicates (i.e. states already
identified and stored in the table) must be detected so that only the new available states are stored (step 512).
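A minimal sketch of steps 512 and 514 is given below, assuming that the table of available states is held as a mapping
from a client identifier to a set of descriptors. This data layout and the function name are illustrative assumptions.

```python
def update_available_states(table, client_id, found):
    """Store newly found descriptors for a client and return only the new ones."""
    known = table.setdefault(client_id, set())
    new = set(found) - known   # step 512: detect duplicates already stored
    known |= new               # step 514: store only the new available states
    return new
```

Using a set makes the duplicate detection a single difference operation, so repeated responses containing the same
hyperlinks do not enlarge the table.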

[0057] FIG. 6 represents the handling of an HTTP request such as requests 316, 332, or any subsequent request
in the first inspection block 206. First, the inspection block receives the data stream in step 600. This stream may
include some compressed or encoded parts, so they must first be decompressed and decoded. After this the request
is ready to be parsed. These steps are performed in the decode and parse HTTP request step 602 in the request
parser block 232.

[0058] After this the first inspection block has to read the host identifier. This information may be obtained, as
described above, from the IP and TCP headers, and it is possible that the inspection block simply receives this data from
the control block 204 readily extracted. If the host identifier received already exists in table 250, as is checked in step
604, the communication between the user agent 150 and the web server 152 corresponds to request 332 or to any
subsequent request for which responses like 316 have already been received. In the opposite case, which will be
discussed first, the request corresponds to the first request 316 from the user agent or host. The analysis is therefore
branched in step 604 or 606.

[0059] If the host identifier is not found, the request descriptors from the parsed request are checked (step 611) on
the basis of the limitations table 250. This may include, for example, parameter limitations related to the server's IP
address, host name, and abs_path part of the URI (Universal Resource Indicator) excluding the query part of the request.
Then the request stream itself is compared (step 612) against fingerprints from known misuse attempts. This step employs
normal pattern recognition methods, such as comparing the stream to some previously stored strings corresponding
to parts of the requests. The comparison may be performed in the fingerprint matcher block 230.
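The string comparison of step 612 may, for example, be sketched as a plain substring search. The fingerprint list
below is hypothetical and serves only as a placeholder for whatever misuse patterns an implementation stores.

```python
# Hypothetical misuse fingerprints (assumption); stored in lower case.
FINGERPRINTS = [b"../", b"/etc/passwd", b"cmd.exe"]


def match_fingerprints(request_bytes, fingerprints=FINGERPRINTS):
    """Return the stored fingerprints that occur anywhere in the raw request."""
    lowered = request_bytes.lower()  # compare case-insensitively
    return [fp for fp in fingerprints if fp in lowered]
```

An empty result means that no known misuse pattern was found; a non-empty result would be passed on to the event
classification.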

[0060] Step 614 evaluates the results of the two previous analysis steps 611 and 612. If the result of the comparison
is that known misuse patterns are detected, or if in step 611 it was noticed that descriptors defining illegal parameter
values etc. were encountered, the event may be classified in the event classification step 616. This verification and
classification may be done in the event analysis and reporting block 240. Then, if necessary, an alert is generated in
step 618, and the request is blocked, i.e. it is not forwarded to the web server. This last step may take place in the
proxy or in the HTTP shield program in the server, but probably not in any network intrusion detection system. The
NIDS generates an alert instead, which helps the system administrator to identify possible problems.
[0061] In the opposite case, i.e. when no known misuse patterns are detected, the host identifier is updated in the
connection table 250 in step 620. Similarly, the client identifier may be updated (step 622) in the table. If any common
parameters are found in the request, they are identified in step 624 and further updated in table 250 in step 626.

[0062] After these actions the handling of the first request has been performed (step 660), and the request is ready
to be forwarded to the web server.

[0063] If in step 604 it is detected that the host identifier already exists in table 250, a client identifier of the request
is processed. In step 606 it is verified whether the client identifier already exists in table 250. Table 250 is thus used
to detect whether the user agent has already been in communication with the server. This can be detected, as explained
above with reference to FIG. 3, because the TCP connection can be terminated after each request-response pair, as
opposed to the case of persistent HTTP connections, for which the TCP connection remains between subsequent
requests. Hence, table 250 can map the response/request descriptors for the user agent identities to span over the
communications history, which includes many different connections. This principle is used when implementing the
invention, i.e. response-request pairs are mapped to bridge over subsequent TCP connections.

[0064] If the client identifier is not found, then in step 606 the descriptor verification block 224 checks whether the
request descriptor can be found in any of the available state tables 251 related to the host identifier. If no matches
are found, which will be ascertained in step 610, the request will be handled as a new request, thus forwarding it to
limitation check step 611, which takes place in block 234. Next, the request is forwarded to the fingerprint matcher
block 230, which performs the fingerprinting step 612. The rest of the steps are also performed as for a new request.
If a descriptor corresponding to another client identifier from the same host is found, the client identifier from the request
is mapped to the found client identifier for future processing and the process is terminated in step 660. This corresponds,
for example, to a case in which the user has simply changed the web browser from Microsoft Internet Explorer to
Netscape and copied the URL from the previous browser screen onto the Netscape location field.
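The branching described in paragraphs [0058] to [0064] may be illustrated with the following sketch. It assumes a
nested mapping from host identifier to client identifier to a set of descriptors; this layout, the function name and the
returned labels are hypothetical and are not taken from the figures.

```python
def classify_request(table, host_id, client_id, descriptor):
    """Return which branch of the request handling a request would take.

    table maps host_id -> {client_id -> set of available descriptors}.
    """
    clients = table.get(host_id)
    if clients is None:
        return "new host"        # host identifier not yet in the table
    if client_id in clients:
        return "known client"    # host and client identifiers both found
    # Unknown client from a known host: look for the descriptor among the
    # available states of the other clients of the same host.
    for other_id, states in list(clients.items()):
        if descriptor in states:
            clients[client_id] = states  # map to the found client's states
            return "mapped to " + other_id
    return "new request"         # no match: handle as a new request
```

In the browser-change example, a request from a new client identifier for a URL already offered to the earlier client of
the same host is mapped to the earlier client's set of available states.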

[0065] If the host and client identifiers are found, the processing is continued in step 650. In this step the common
parameters are identified from the HTTP headers, in step 652 they are compared to the values retrieved from table
250, and if there are any anomalies, such as invalid parameter values detected, the next step 616 is to continue with
an event classification procedure as already explained. This may correspond, for example, to a situation in which
someone is typing the HTTP requests directly using Telnet. In general, this may be considered suspicious. Alternatively
(not shown in figure), fingerprint analysis as explained above may be done for the request and the request allowed if
no anomalies are found.

[0066] In the opposite case, the possible limitations, such as for the scripts, are retrieved from the limitations file 252
in step 654, and the descriptors identified in the previous responses are read from the table of available states 250 in
step 656. The parsed descriptors in the request are then further compared with the retrieved limitations and descriptors
in the step 658. If the descriptors match those in the set of available states as combined from the limitations and the
descriptors, the request appears to be valid, and the process may be ended (step 680). In the case of a web proxy,
the request may be passed to the web server, and in the case of a NIDS no operation is needed.
[0067] In the opposite case, i.e. if at least one of the parameters is not valid, it is possible that this corresponds to
an event. For this reason the possible event is checked in the event classification step 616. Further, if there is an event
which corresponds to something alarming, such as a hacking attempt, an alarm may be generated (step 618). After
this, the processing is ended (step 680). It is possible that in the event classification step it is detected that the
modifications in the descriptors are not harmful. In this case it is possible to allow the forwarding of the request if the
invention is implemented in a proxy element.
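The comparison of step 658 may be sketched as below, assuming that the request descriptors are the query parameters
of the request URI and that the available states and length limitations are held in plain dictionaries. The function
names and the data layout are illustrative assumptions.

```python
from urllib.parse import parse_qsl, urlsplit


def request_params(uri):
    """Parse the query part of a request URI into a name -> value mapping."""
    return dict(parse_qsl(urlsplit(uri).query, keep_blank_values=True))


def check_request(params, states, max_length):
    """Return a list of anomalies; an empty list means the request conforms."""
    anomalies = []
    for name, value in params.items():
        allowed = states.get(name)
        # A parameter with stored states must use one of the offered values.
        if allowed is not None and value not in allowed:
            anomalies.append(f"{name}: value {value!r} not in available states")
        limit = max_length.get(name)
        if limit is not None and len(value) > limit:
            anomalies.append(f"{name}: length {len(value)} exceeds {limit}")
    return anomalies
```

A non-empty list would be handed to the event classification step, which decides whether the deviation is alarming or
harmless.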

[0068] FIG. 7 shows how the table of available states is initialized when a new client and/or host is observed for the
first time. The procedure described here is additional, and it may be skipped as well, but this enhances the system
performance and reduces extra processing of future requests. After step 626 and before the request is forwarded, the
system may read the configured states in the memory (step 710). These configured states may include some known
limitations for the services or states which are known to cause false alarms if not specifically allowed etc., which are
static in their nature and common for most clients. These descriptors are then copied in the initialized states table 251
in step 712. An alternative for this is, of course, that the descriptors defining the configured states are stored in a file
accessible for the inspection block 206, so that this file is read when processing requests from all clients. The latter
solution complicates the system a bit, but possible changes in the configuration may be put in effect more quickly.
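The initialization of steps 710 and 712 may be sketched as follows. The configured states shown are hypothetical
examples, and the table is again assumed to map a client identifier to a set of descriptors.

```python
# Hypothetical configured states that are static and common to most clients.
CONFIGURED_STATES = frozenset({"/", "/index.html", "/favicon.ico"})


def init_client_states(table, client_id, configured=CONFIGURED_STATES):
    """Seed the states of a newly observed client with the configured states."""
    if client_id not in table:
        # Copy, so that later per-client updates do not alter the configuration.
        table[client_id] = set(configured)
    return table[client_id]
```

Copying the configured states into each new entry corresponds to the first alternative described above; the second
alternative would instead consult the shared configuration on every request.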

[0069] One further embodiment of the present invention concerns how the computer system is implemented. For
load balancing purposes and scalability, one processor or one server system may not seem sufficient when high network
loads are expected. For this reason, the system may be implemented in a computer cluster, in a manner such as is
described below.

[0070] The system may comprise a plurality of processing units for identifying the request and/or response
descriptors, and the analysis is taken care of in one processing unit. The processing unit for parsing the request may be
selected according to some predefined criteria. These predefined criteria can be, for example, the IP addresses in data
packets carrying the requests and responses, the host identifier, the web service used, the current load in one
processing unit, and so on. The parsed requests and/or responses from a plurality of processing units are sent to a processing
unit handling the analysis. Comparison of the request descriptors with the table of available states of the client is thus
performed in the same processing unit.

[0071] If the system capacity is further enhanced, more processing units can be designated for the comparing step
as well, but this is not as straightforward as using multiple parsing blocks, since all requests and responses of one
host need to be processed in the same comparing block in order to have all required information for the analysis. The
processing unit with which the request is analyzed may be selected according to some predefined criteria. These
predefined criteria can be the host identifier, the web service used, the current load in one processing unit, and so on.
[0072] The order in which the processing steps (FIG. 4-7) are performed is merely illustrative and is intended in no
way as restrictive in the implementation of the invention. Due to the nature of the invention, only one possibility has
been shown, but an educated reader skilled in the art may end up having a similar effect by slightly changing the
processing order.
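As regards the selection of a comparing unit discussed in paragraphs [0070] and [0071], one simple criterion is a
hash of the host identifier. The sketch below uses a CRC-32 checksum, which is an assumption of this example and not
a requirement of the invention; the function name is hypothetical.

```python
import zlib


def select_unit(host_id, unit_count):
    """Map a host identifier to the index of a comparing unit.

    The same host identifier always maps to the same unit, so all requests
    and responses of one host are processed in the same comparing block.
    """
    return zlib.crc32(host_id.encode("utf-8")) % unit_count
```

A checksum is used rather than the built-in hash function because the latter is randomized per process for strings and
would therefore not agree between separate processing units.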

[0073] Even though the present invention has been explained above using a proxy system as an example, it is to
be understood that a network proxy is not the only network element in which the invention could be implemented as
such. For example, a network intrusion detection system (NIDS) may include elements similar to the inspection blocks
206 and 266, together with the relevant data files, as well as the event analysis and reporting block 240. The NIDS
monitors network traffic and tries to detect possible intrusions. It knows the status of connections, and if it is listening
to the network traffic, it may detect a data stream which is not allowed. As this may be a precursor sign of an approaching
intrusion, it may be immediately reported, but usually the NIDS starts to follow the connection in question in more detail.
[0074] A stateful firewall capable of performing the filtering operations described above is also a suitable location for
implementing the present invention.

[0075] The invention applies equally well to systems other than TCP/IP networks in connection with data streams
utilizing HTTP. As one example, WAP (Wireless Application Protocol) carrying presentation language content such as
WML is an applicable target for implementing the invention as well when the traffic is run over normal circuit-switched
or packet-switched data, such as GPRS. As another example, systems utilizing XML or some other presentation
language apply as well, hereby increasing the scope of the invention. Hence, the invention is not to be interpreted as
limited by the description but is to be understood as described in the independent claims.


Claims 

1. A method for monitoring the flow of a data stream between a client and a server, wherein the data stream is carrying
representation data on a connection-oriented carrier protocol, characterized in that the method includes the steps
of:

- analyzing a data stream traveling from the server to the client in order to identify at least one response
descriptor in the data stream,
- in response to the analyzing step, storing response descriptors identified in a set of available states for said
client,
- analyzing a data stream traveling from the client to the server in order to identify at least one request descriptor,
- comparing said request descriptors with said set of available states for said client, and
- in response to the comparing step, generating a monitoring result.

2. A method according to claim 1, characterized in that the method further includes the step of performing a
predetermined action at least partially based on the monitoring result.

3. A method according to claim 2, characterized in that said action includes matching the data stream against known
misuse patterns, if at least one request descriptor fails to match the stored response descriptors in the set of
available states.

4. A method according to claim 2, characterized in that said action includes allowing the data stream passage if the
request descriptors match the stored response descriptors in the set of available states, and restricting the data
stream if at least one request descriptor fails to match the stored response descriptors in the set of available states.

5. A method according to claim 2, characterized in that said action includes generating an alarm event, which is
selected at least partially on the basis of the monitoring result.

6. A method according to claim 1, characterized in that the method further includes the steps of:

- storing a host identifier from the carrier protocol part of the first data stream between the client and the server
and
- using said stored host identifier to select the set of available states for said client.

7. A method according to claim 1, characterized in that the method further includes the steps of:

- storing a client identifier from the first data stream between the client and the server and
- using said stored client identifier to select the set of available states for said client.

8. A method according to claims 6 and 7, characterized in that the method further includes the steps of:

- comparing the client identifier of the data stream to the client identifiers stored in order to select the set of
available states for said client and
- selecting the set of available states for said client from the set of available states for said host, if the result of
the comparing step is that the client identifier does not correspond to the client identifiers stored.

9. A method according to claim 1, characterized in that the analyzing step of a data stream traveling from the server
to the client includes analyzing different possible states of macro and/or form definitions included in the data stream
for identifying at least one response descriptor.

10. A method according to claim 9, characterized in that the analyzing step of different possible states includes
executing the macro file in the system in order to identify at least one response descriptor.

11. A system for monitoring the flow of a data stream traveling from a client to a server, wherein the data stream is
carrying representation data on a connection-oriented carrier protocol, characterized in that the system includes:

- analyzing means adapted to analyze a data stream traveling from the server to the client in order to identify
at least one response descriptor in the data stream,
- storing means responsive to the analyzing means, adapted to store response descriptors identified in a set of
available states for said client,
- analyzing means adapted to analyze a data stream traveling from the client to the server in order to identify
at least one request descriptor,
- comparing means for comparing said request descriptors with said set of available states for said client and
- means responsive to said comparing means, adapted to generate a monitoring result.

12. A system according to claim 11, characterized in that the method further includes execution means for performing
a predetermined action at least partially based on the monitoring result.

13. A system according to claim 12, characterized in that said execution means include means adapted to compare
the data stream with known misuse patterns, if at least one request descriptor fails to match the stored response
descriptors in the set of available states.

14. A system according to claim 12, characterized in that said execution means include means adapted to allow the
data stream passage, if the request descriptors match the stored response descriptors in the set of available states,
and means adapted to restrict the data stream, if at least one request descriptor fails to match the stored response
descriptors in the set of available states.

15. A system according to claim 12, characterized in that said execution means include means adapted to generate
an alarm event, including means to select the event at least partially based on the monitoring result.

16. A system according to claim 11, characterized in that the system further includes:

- storing means adapted to store a host identifier from the carrier protocol part of the first data stream between
the client and the server and
- selecting means, adapted to use said stored host identifier to select the set of available states for said client.

17. A system according to claim 16, characterized in that the system further includes:

- storing means adapted to store a client identifier from the first data stream between the client and the server and
- means adapted to use said stored client identifier to identify the set of available states for said client.

18. A system according to claim 17, characterized in that the system further includes:

- comparing means adapted to compare the client identifier of the data stream to the client identifiers stored in
order to select the set of available states for said client and
- selecting means adapted to select the set of available states for said client from the set of available states for
said host, if the result of the comparing step is that the client identifier does not correspond to the client
identifiers stored.

19. A system according to claim 11, characterized in that the analyzing means include means adapted to analyze
different possible states of macro and/or form definitions included in the data stream for identifying at least one
response descriptor.

20. A system according to claim 19, characterized in that the analyzing means include means adapted to execute
the macro file in the system in order to identify at least one response descriptor.

21. A computer program product stored on a computer readable storage medium, the product being adapted, when
run on a computer, to perform monitoring of the flow of a data stream between a client and a server, wherein the
data stream is carrying representation data on a connection-oriented carrier protocol, said monitoring comprising
the steps of:

- analyzing a data stream traveling from a server to a client in order to identify at least one response descriptor
in the data stream,
- in response to the analyzing step, storing response descriptors identified into a set of available states for said
client,
- analyzing a data stream traveling from the client to the server in order to identify at least one request descriptor,
- comparing said request descriptors with said set of available states for said client and
- in response to the comparing step, generating a monitoring result.
